Abstract 

Background: Heterococcus is a microalgal genus of Xanthophyceae (Stramenopiles) that is common and 
widespread in soils, especially from cold regions. Species are characterized by extensively branched filaments 
produced when grown on agarized culture medium. Despite the large number of species described exclusively 
using light microscopic morphology, the assessment of species diversity is hampered by extensive morphological 
plasticity. 

Results: Two independent types of molecular data, the chloroplast-encoded psbA/rbcL spacer complemented by the 
rbcL gene and the internal transcribed spacer 2 of the nuclear rDNA cistron (ITS2), congruently recovered a robust 
phylogenetic structure. With ITS2 considerable sequence and secondary structure divergence existed among the 
eight species, but a combined sequence and secondary structure phylogenetic analysis confined to helix II of ITS2 
corroborated relationships as inferred from the rbcL gene phylogeny. Intra-genomic divergence of ITS2 sequences 
was revealed in many strains. The 'monophyletic species concept', appropriate for microalgae without known sexual 
reproduction, revealed eight different species. Species boundaries established using the molecular-based 
monophyletic species concept were more conservative than the traditional morphological species concept. Within 
a species, almost identical chloroplast marker sequences (genotypes) were repeatedly recovered from strains of 
different origins. At least two species had widespread geographical distributions; however, within a given species, 
genotypes recovered from Antarctic strains were distinct from those in temperate habitats. Furthermore, the 
sequence diversity may correspond to adaptation to different types of habitats or climates. 

Conclusions: We established a method and a reference data base for the unambiguous identification of species of 
the common soil microalgal genus Heterococcus which uses DNA sequence variation in markers from plastid and 
nuclear genomes. The molecular data were more reliable and more conservative than morphological data. 

Background 

Heterococcus is a genus of yellow-green microalgae 
(Xanthophyceae, Stramenopiles) that is common and 
widespread in soils of cold regions such as the Alps or 
Antarctica [1,2]. In addition to soils, three species have 
been reported from freshwater [3-6], and Heterococcus is 
the only xanthophyte known from lichen symbiosis [7,8]. 
Heterococcus produces extensively branched filaments 
when grown on agarized culture medium (Figure 1); 
however, in field samples it produces unicellular coccoid 
cells that are weakly connected. Perhaps uniquely for 
microalgal genera, all species have been described based 
upon isolates grown in culture and observed with a light 
microscope [1,2,6]. Without culturing, Heterococcus is 
often mistaken for other coccoid xanthophytes, eustig- 
matophytes or green algae. Sixty-one Heterococcus spe- 
cies have been described [9], and 51 species are 
recognized [10]. Extensive ultrastructural observations 
were undertaken by Lokhorst [2], but he reluctantly con- 
cluded that ultrastructural features were not sufficient to 
distinguish species. 

Sexual reproduction is unknown for Heterococcus, and 
therefore the biological species concept cannot be 
employed (e.g. [11]); only the morphological (typo- 
logical) species concept has been used. That is, Hetero- 
coccus species identity is limited to light microscopic 
morphological characters interpreted within the exten- 
sive plasticity that is exhibited during culture studies 
[1,2,6]. For example, branching patterns are not present 
in very young or old cultures, and filament formation is 
suppressed (coccoid cells are produced) when cultures 
are grown at suboptimal temperature ranges [1] 
(Figure 1). Cladistic analysis of these morphological fea- 
tures would be extremely difficult because cell sizes, 
branching patterns, colony growth, chloroplast number 
and other features overlap extensively among the spe- 
cies, even when grown under optimum conditions. 

Molecular phylogenetic analysis is often a reliable alternative 
for identification of species; however, the species 
diversity of Heterococcus had not been studied using 
molecular markers, and no molecular reference data base existed. 
DNA sequences had previously been reported from only 
seven Heterococcus species, and all these sequences 
were from conserved molecular markers. The sequences 
revealed the probable monophyletic origin of the genus 
and its basal position within the Xanthophyceae, 
which was distinct from other filamentous members 
(e.g. Tribonema, Vaucheria) [12-15]. We used molecu- 
lar phylogenetics, especially within the framework of the 
monophyletic species concept [16-18], to evaluate 33 
culture strains identified as Heterococcus (Figure 2). 
Fourteen strains were originally identified to species 
level using morphology, and ten of those strains were 
authentic culture strains, i.e. the culture strains used to 
describe the species [1,3-5,19]. Unfortunately, the cul- 
tures used to describe all other species have been lost. 
For nine authentic strains, there are extended morpho- 
logical descriptions with numerous illustrations pro- 
duced by two independent authors [2-5]. We added 19 
unidentified culture isolates, including twelve cultures 
recently isolated. Our goals were (1) to test boundaries 
of Heterococcus species as inferred from morphological 
features and (2) to establish a reference data base of 
strains unambiguously distinguished with DNA se- 
quence data. We chose two highly variable molecular 
markers, i.e. the chloroplast-encoded psbA/rbcL spacer 
region [20,21] and the nuclear-encoded internal tran- 
scribed spacer 2 of the nuclear rDNA cistron [22-24], to 
examine species boundaries. We also determined full 
plastid-encoded rbcL gene sequences to infer the phylo- 
genetic position of species. 

Results 

Four of the strains previously identified as Heterococcus 
were green algae (Figure 2) and were excluded from the 
rest of the study. The rbcL gene sequences were used to 
assess the phylogenetic relationships of the remaining 29 
strains (Figure 3, Additional file 1). For 25 strains, PCR 
amplification was successful for the whole region from 
psbA (downstream), through the rbcL gene, through the rbcL/ 




Figure 1 Morphology of three strains of Heterococcus viridis in culture. (A) Prostrate colonies produced by branched filaments on the 
surface of an agarized culture, 16 weeks old (strain B10). (B) Enlarged view of a young (4 weeks old) colony, liquid culture, strain SAG 835-7. 
(C) Enlarged filament, 6-week-old agarized culture (strain MZ3-7). (D) Coccoid cells in a 4-week-old liquid culture (strain SAG 835-7). Scale bar in 
(A) 500 μm, in (B) - (D) 20 μm. 

Strain; original identification; origin 

rbcL H. viridis clade, psbA/rbcL spacer and ITS2 group A, new species designation H. viridis 
SAG 835-3; H. viridis *; Switzerland, freshwater 
SAG 835-6; H. mainxii *; Czech Republic, freshwater 
SAG 835-7; H. marietanii *; Switzerland, freshwater 
SAG 835-1; H. brevicellularis *; Switzerland, soil 
SAG 835-8; H. moniliformis *; Switzerland, soil 
SAG 56.94; H. sp.; Germany, soil 
EIF 398; H. sp.; Antarctica, soil 
EIF PAB 398/473; H. pleurococcoides; Antarctica, soil 
EIF430/A801-2; H. sp.; Antarctica, soil 
EIF PAB 397/380; H. pleurococcoides; Antarctica, soil 
MZ2-4; H. sp. a; Antarctica, soil 
MZ2-5; H. sp. a; Antarctica, soil 
B10; H. sp. a; Antarctica, soil 
SAG 2162; H. sp. a; Antarctica, soil 
MZ3-7; H. sp. a; Antarctica, soil 

rbcL H. viridis clade, psbA/rbcL spacer and ITS2 group B 
MZ1-3; H. sp.; Antarctica, soil 
MZ1-6; H. sp. a; Antarctica, soil 
DB14-15; H. sp. a; Germany, freshwater 
DB15-5; H. sp. a; Germany, freshwater 

rbcL H. viridis clade, psbA/rbcL spacer and ITS2 group C, new species designation H. virginis 
SAG 2163; H. sp. a; Antarctica, soil 

rbcL H. caespitosus clade, psbA/rbcL spacer and ITS2 group D, new species designation H. leptosiroides 
EIF423/A790-45; H. sp.; Antarctica, soil 
EIF PAB 399/372; H. caespitosus; Antarctica, soil 
EIF 128/A788-70; H. protonematoides; Antarctica, rock surface 

rbcL H. caespitosus clade, psbA/rbcL spacer and ITS2 group E, new species designation H. caespitosus 
SAG 835-2a; H. caespitosus *; Germany, soil 
SAG 835-9; H. protonematoides *; Switzerland, rock surface 

rbcL lineage H. sp., psbA/rbcL spacer and ITS2 group F, new species designation H. ramosissimus 
DB14-1-1; H. sp. a; Germany, freshwater 
DB14-5-1; H. sp. a; Germany, freshwater 

rbcL lineage H. crassulus, psbA/rbcL spacer and ITS2 group G, new species designation H. crassulus 
SAG 835-4; H. crassulus * 

rbcL lineage H. fuornensis, psbA/rbcL spacer and ITS2 group n.a. 1, new species designation H. fuornensis 
SAG 835-5; H. fuornensis * 

rbcL clade n.a. 2, psbA/rbcL spacer and ITS2 group n.a. 2, new species designation Desmococcus antarctica 
SAG 63.90; H. endolithicus * 

rbcL clade n.a. 2, psbA/rbcL spacer and ITS2 group n.a. 2, Desmococcus-like green algae 
EIF434/A801-133; H. sp.; Antarctica, soil 
EIF446/A812-63a; H. sp.; Antarctica, soil 
EIF447/A834-545; H. sp.; Antarctica, soil 

Figure 2 The 33 strains identified as Heterococcus used in this study. The Heterococcus strains are listed with their species names (where 
provided) from previous morphological analyses, their assignment to clades and lineages in the rbcL phylogeny (boxed with thick lines; see 
Figure 3), their assignments to a certain species recognized in this study (boxed with thin lines), their new species designations (see Discussion) 
and their geographic origin. Highlighted in green are genotypes, i.e. groups of strains exhibiting high sequence similarities (see text). Strains in 
bold letters represent cryopreserved epitypes (reference strains) designated for each species (see Discussion). An asterisk marks an authentic 
reference strain (see text). An "a" marks those strains that have recently been isolated by us or were provided to us for this study; n.a. 1, not applicable 
because the psbA/rbcL spacer sequence could not be determined (see text); n.a. 2, not applicable because strains were identified as green algae 
(see text). 



rbcS spacer and to the rbcS gene; therefore the full rbcL 
gene, 1467 base pairs long, was determined (Additional 
file 2). We failed to obtain full rbcL sequences for three 
authentic strains, Heterococcus fuornensis Vischer strain 
SAG 835-5, H. caespitosus Vischer strain SAG 835-2a, 
and H. protonematoides Vischer strain SAG 835-9, but 
we used available sequences (AM421004, AM421002 
and AJ579575) for these three strains. Also, for strains 
DB14-15 and MZ1-6 the full rbcL failed to amplify. Fif- 
teen different rbcL sequences were recovered among the 
29 strains, which implies that the rbcL gene was identi- 
cal among many strains (Additional file 3). Only the 15 
different rbcL sequences were used for phylogenetic ana- 
lyses (Figure 3, Additional file 1). Monophyly of Hetero- 
coccus was highly supported with all methods except 
maximum likelihood, and this confirmed the generic 
identity of the 29 strains. The analyses resolved two well 
supported clades, named "H. caespitosus clade" and "H. 
viridis clade". In addition, there were three independent 
lineages representing H. crassulus Vischer, H. fuornensis 
and an unidentified strain ("H. sp."). Relationships 
among the clades and lineages remained ambiguous 
(Figure 3, Additional file 1). 
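
The step of reducing 29 strains to 15 distinct rbcL sequences can be illustrated with a minimal sketch. It assumes the sequences are already available as plain strings keyed by strain name; the function name `group_identical` and the toy sequences are hypothetical and are not the published data.

```python
from collections import defaultdict


def group_identical(seqs):
    """Map each distinct sequence to the list of strains that carry it."""
    groups = defaultdict(list)
    for strain, seq in seqs.items():
        # Normalize case and drop alignment gaps before comparing.
        key = seq.upper().replace("-", "")
        groups[key].append(strain)
    return dict(groups)


# Toy input (hypothetical strain names and sequences).
toy = {
    "strain_1": "ATGGCTTCA",
    "strain_2": "ATGGCTTCA",
    "strain_3": "ATGGCATCA",
    "strain_4": "atggcttca",
}

haplotypes = group_identical(toy)
for seq, strains in haplotypes.items():
    print(seq, strains)
```

Only one representative per distinct sequence would then need to be carried into a phylogenetic analysis, which is the reduction described above.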

psbA/rbcL spacer 

To further examine the relationships, the psbA/rbcL spacer 
sequences were determined for 28 strains (H. fuornensis 
strain SAG 835-5 failed to amplify). The spacers 
varied greatly in length and primary sequences; the 
sequences could not be aligned across all strains. Nevertheless, 
two short sequence stretches were aligned across 
all strains. The first was 23 nucleotides at the 5'-end 
(pos. 78-99 of reference sequence H. viridis Chodat 
strain SAG 835-3, JX681220) and the second was 36 
nucleotides at the 3'-end (pos. 312 - 347, same reference 
sequence). 

Tip labels of the rbcL tree (Figure 3), by clade or lineage: 
H. viridis clade: EIF PAB 398/473 H. pleurococcoides; SAG 2162 Heterococcus sp.; B10 Heterococcus sp.; SAG 835-3 Heterococcus viridis; 
SAG 835-7 Heterococcus marietanii; MZ3-7 Heterococcus sp.; SAG 835-1 Heterococcus brevicellularis; DB15-5 Heterococcus sp.; 
MZ1-3 Heterococcus sp.; SAG 2163 Heterococcus sp. 
H. caespitosus clade: EIF128/A788-70 H. protonematoides; SAG 835-2a H. caespitosus AM421002 
H. sp.: DB14-1-1 Heterococcus sp. 
H. fuornensis: SAG 835-5 H. fuornensis AM421004 
H. crassulus: SAG 835-4 Heterococcus crassulus 


Figure 3 Maximum likelihood (ML) phylogeny of rbcL gene sequences for 15 Heterococcus strains. Twelve other strains had sequences 
identical to one of the 15 shown (Additional file 3). Sequences without accession numbers are reported for the first time. Sequence names 
highlighted in blue indicate authentic reference strains; names used in the tree are those used to identify the original cultures (see text). Capital 
letters in filled blue circles indicate the seven species apart from H. fuornensis resolved by the psbA/rbcL spacer and ITS2 sequence analyses (see 
text). The names next to the tree represent clades and lineages recovered in the phylogenetic analyses. Thick lines indicate internal branches 
resolved by maximum likelihood, maximum parsimony, minimum evolution distance and Bayesian analyses and with significant statistical support 
(bootstrap >95%, posterior probability = 1.0). Black filled circle marks the branch indicating the monophyletic origin of Heterococcus that was 
significantly supported (bootstrap >95%, posterior probability = 1.0) except for the maximum likelihood analyses. The phylogeny shown is part of 
a larger ML phylogeny (calculated with GARLI v0.96 [25,26]) based on a rbcL data set (1325 bp long, 517/418 variable/parsimony informative sites) 
consisting of 15 Heterococcus sequences and 32 other Xanthophyceae sequences corresponding to clades C, B, T, and V as defined in [14] (see 
Additional file 1) as well as two outgroup taxa. Scale bar, substitutions per site. 
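
The "variable/parsimony informative sites" figures quoted for the alignments follow the usual definitions: a column is variable if it contains more than one nucleotide, and parsimony informative if at least two nucleotides each occur in at least two sequences. The sketch below is illustrative only, assuming an already aligned set of equal-length sequences; the function name `site_counts` and the toy alignment are hypothetical, and gaps or ambiguity codes are simply ignored.

```python
from collections import Counter


def site_counts(alignment):
    """Return (variable, parsimony_informative) column counts."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    variable = 0
    informative = 0
    for column in zip(*(s.upper() for s in alignment)):
        # Count only unambiguous nucleotides; gaps and N are skipped.
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) > 1:
            variable += 1
            # Informative: at least two states, each seen in >= 2 sequences.
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return variable, informative


# Toy alignment (hypothetical): columns 2, 4 and 5 vary; 2 and 5 are informative.
toy_alignment = ["ACGTAC", "ACGTTC", "ATGTTC", "ATGAAC"]
print(site_counts(toy_alignment))  # (3, 2)
```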



In most Heterococcus strains the nucleotide length of 
the psbA/rbcL spacer ranged from 275 nucleotides (H. 
caespitosus strain SAG 835-2a) to 289 nucleotides (H. 
sp. strain DB14-15). The sequence for H. crassulus strain 
SAG 835-4 was 1762 nucleotides, and the identical 
sequences for two strains, DB14-1-1 and DB14-5-1, were 
2143 nucleotides. Sequence similarities further down- 
stream grouped the strains into seven "spacer groups", 
A - G, within which the psbA/rbcL spacers were identi- 
cal or displayed only very few differences (Figure 2, 
Additional file 3). When mapped on the rbcL phylogeny, the 
strains of spacer groups A, B and C were included in the 
H. viridis clade, strains of spacer groups D and E fell in 
the H. caespitosus clade, and spacer groups F and G 
represented the lineages "H. sp." and H. crassulus 
(Figure 3). 

Between closely related groups or within a group, other 
regions of the psbA/rbcL spacer sequences could also be 
aligned. For example, strains of the Heterococcus viridis 
clade (groups A-C) had sequence regions that aligned 
well, but there were up to 28 nucleotide differences 
among them. In addition, there was a hypervariable region 
of different lengths (20-31 nucleotides, between pos. 172 
and 193 of the reference sequence H. viridis SAG 835-3, 
JX681220) that was not alignable among the three groups, 
but clearly distinguished them from each other. In the H. 
caespitosus clade, i.e. between groups D and E, the psbA/rbcL 
spacers also aligned well over their entire lengths, but 
differed at 14 sequence positions and a single indel. Similarly, 
there was a maximum of 13 psbA/rbcL spacer sequence 
differences between strains of group A. In group 
A there were nine strains isolated from Antarctica 
(Figure 2). When Antarctic strain MZ3-7 was not considered, 
these strains differed from each other by no more than two 
nucleotides, and the previously unidentified strain SAG 
56.94, isolated from Germany, had just one to three sequence 
differences with the eight Antarctic isolates. Conversely, 
strain MZ3-7 was more distant from the other eight Antarctic 
strains, with seven to nine spacer differences. H. brevicellularis 
Vischer strain SAG 835-1 was the closest neighbor of strain MZ3-7; 
the two strains differed at just four sequence positions. 
Group B contained two Antarctic strains (MZ1-3, MZ1-6) 
that had identical spacers; Group B also contained two 
German strains (DB14-15, DB15-5) with identical spacers; 
however, the Antarctic strains differed at 4 positions when 
compared to the German strains. Finally, group D had 
three strains that had only one nucleotide difference, while 
two strains in group F had only two sequence differences. 
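
Comparisons of the form "differed at 14 sequence positions and a single indel" can be reproduced from a pairwise alignment by counting mismatched columns and contiguous gap runs separately. The following is a minimal sketch under the assumption that the two sequences are already aligned with "-" as the gap character; the function name `count_differences` and the toy sequences are hypothetical.

```python
def count_differences(a, b):
    """Return (substitutions, indels) for two aligned sequences.

    A run of consecutive gap columns in the same sequence counts as one indel.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = 0
    indels = 0
    gap_owner = None  # which sequence carries the gap run currently open
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # shared gap column carries no information
        if x == "-" or y == "-":
            owner = "a" if x == "-" else "b"
            if owner != gap_owner:
                indels += 1
            gap_owner = owner
        else:
            gap_owner = None
            if x != y:
                substitutions += 1
    return substitutions, indels


# Toy pair (hypothetical): two substitutions and one 2-nucleotide indel.
print(count_differences("ACGT--ACGT", "ACCTGGACGA"))  # (2, 1)
```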

ITS2 

Nuclear-encoded ITS2 sequences were determined for 
28 strains as an independent assessment of the plastid- 
encoded sequences. Heterococcus fuornensis strain SAG 

Figure 4 ITS2 sequence and secondary structure phylogenetic analyses of 28 strains of Heterococcus. (A) ProfDistS [27] sequence-structure 
NJ tree (unrooted) as derived from the multiple sequence-structure alignment of ITS2 helix II. Bootstrap values (100 pseudo-replicates) 
are mapped to the appropriate internodes. Branch lengths are drawn proportional to inferred changes. The template ITS2 variant used in (B) is 
highlighted in bold. (B) ITS2 secondary structure of ITS2 variant DB14-1-1_acl11 (group F, H. ramosissimus) used for homology modeling of helix II 
(shaded) for all strains of Heterococcus. The secondary structure was visualized with VARNA [28]. Helices are numbered I-IV. Typical ITS2 motifs are 
highlighted by filled arrowheads. Open arrowheads mark positions of two CBCs that distinguish groups D (= H. leptosiroides) and E (= H. 
caespitosus). An additional conserved region throughout all strains of Heterococcus is indicated by a cloud (see text). In contrast to the template 
structure the region d1 is deleted in four strains (group D, see Additional file 4). The region d2 is deleted in all other strains not classified in 
group F. (C) Subtree as obtained by using the complete sequence-structure information from helices I-IV (template highlighted in bold). Further 
subtrees as derived using clade specific structural templates (helices I-IV) are provided as Additional files 4, 5, 6. (D) Visualization of the complete 
sequence-structure alignment used to generate the tree as shown in (A). Consensus structure (51%) of helix II for all ITS2 sequences obtained 
from the complete multiple sequence-structure alignment without gaps. Sequence conservation is indicated from red (not conserved) to green 
(conserved). Nucleotides which are 100% conserved in all sequences are written as A, U, G or C. Nucleotide bonds which are 100% conserved 
throughout the alignment are marked in yellow. Note the U-U mismatch. The figure was generated with 4SALE [29]. 

835-5 was successfully amplified and included; however, 
amplification failed for strain MZ1-6 and this strain was 
not included in the ITS2 analyses. Based upon alignment 
similarity, the ITS2 sequences formed the same groups 
that were recovered in the psbA/rbcL spacer analysis; 
therefore, we used the same group notation for both 
datasets. Within a spacer group, the ITS2 sequences and 
their secondary structures were easily aligned and rather 
similar; conversely, between spacer groups, the sequences 
and secondary structures were highly variable, i.e. they 
could be aligned with confidence only for a few short seg- 
ments. The ITS2 sequences exhibited a considerable 
length variation of up to about 130 nucleotides between 
spacer groups. The shortest ITS2 had 285 nucleotides 
(strain SAG 2163 from group C; strain EIF PAB 399/372 from 
group D); the longest sequence had 416 nucleotides 
(strains DB14-1-1, DB14-5-1 from group F). Within each 
spacer group, the ITS2 sequences were relatively constant 
in length (variation < 10 nucleotides), except for group D 
where sequences were either short (285-287 nucleotides) 
or long (315-319 nucleotides), and the difference was due 
to an indel at the terminal end of helix III in the secondary 
structure model (see below; Additional file 4). The ITS2 
sequence from Heterococcus fuornensis, which had a 
distinctive rbcL gene but could not be amplified for the 
psbA/rbcL spacer, showed little similarity to other spacer 
groups. 

The inferred RNA secondary structures folded into the 
common core structure known for eukaryotes [23] 
which consisted of four helices with the third being the 
longest and helix IV the shortest (Figure 4, Additional 
files 4, 5, 6). Because of the high sequence length vari- 
ation there was not a single ITS2 secondary structure 
from which the secondary structure models of the 
remaining sequences could be deduced using homology 
modeling. Only helix II could be modeled throughout 
the set of sequences independent of the used sequence- 
structure pair. However, within each group complete 
secondary structures could be obtained by homology 
modeling (Figure 4, Additional files 4, 5, 6). Throughout 
the set of sequences, conserved regions were restricted 
to the entire helix II (pos. 86-125 of reference sequence 
H. viridis SAG 835-3, JX681147), which had a constant 
length of 40 nucleotides, and a segment of about 50 
nucleotides (pos. 165-189 and 205-228 of the same 
reference sequence) located at or close to the distal end 
of helix III (Figure 4, Additional files 4, 5, 6). It was followed 
by an extended terminal end of helix III of 45 
and 133 nucleotides in spacer groups D and F, respectively, whereas 
the corresponding sequence region in other spacer 
groups comprised six (H. fuornensis strain SAG 835-5, 
no assigned group) to 18 nucleotides (spacer group E). 
That means there was a continuous lengthening/ 
shortening of the ITS2 helix III within Heterococcus 
(Figure 4, Additional files 4, 5, 6). Another conserved 
ITS2 region useful to distinguish groups among Heterococcus 
strains was an unpaired sequence segment (~12 
nucleotides) adjacent to helix II (Figure 4; pos. 126-137 of 
reference sequence H. viridis SAG 835-3, JX681147). It 
separated H. crassulus SAG 835-4, H. fuornensis SAG 
835-5, and two clusters of strains from each other. One 
cluster comprised the strains from groups A-C, the 
other the strains from groups D-F. Within each cluster 
the sequence segments were invariant. 

Multiple copies of ITS2 were recovered in eight strains 
(from groups A, B, D, F and G; Additional file 2), i.e. 
there were no clear sequence reads possible without 
cloning. Four to 12 clones per strain were sequenced 
and this revealed up to seven ITS2 variants per strain 
(Additional file 2). Differences between ITS2 variants 
consisted of one to seven sequence positions and a few 
small indels (< 5 nucleotides); they were mostly located 
in helices I, IV and the basal part of helix III preceding 
the conserved segment. In groups B and D differences 
between ITS2 variants were also located in the con- 
served helix II. In group D three out of the ten detected 
ITS2 variants were lacking the extended 45 nucleotides 
long terminal end of helix III. These shorter variants 
were present in all three strains of group D or in about 
half (10) out of the sequenced 21 clones, while the 
longer ITS2 variants were retrieved only from two 
strains, EIF 423/A790-45 and EIF 128/A788-70. 

The ITS2 phylogenetic analyses were confined to helix 
II (with adjacent unpaired conserved region, pos. 85- 
134 of reference sequence H. viridis SAG 835-3, 
JX681147) for assessing relationships among all studied 
strains. The sequence alignment was relatively short 
(50 positions); it contained no more than 14/9 variable/ 
parsimony informative sites and just nine sequences 
were not identical with others. However, a well-resolved 
phylogeny was obtained when secondary structure was 
considered in addition to primary structure information 
(Figure 4). The resolved helix II sequence groups were 
congruent with the groups recovered in the rbcL phylogeny 
(see spacer group letters on Figure 4). A common 
origin of H. crassulus with H. fuornensis was well supported 
in the unrooted ITS2 (helix II) phylogeny, and 
this contrasted with the rbcL phylogeny where the relationships 
of both were unresolved (Figure 3). Also an 
unrooted (maximum likelihood) phylogeny of only Heterococcus 
rbcL gene sequences did not support the common 
origin of both species (not shown). The helix II 
phylogenetic tree resolved a close relationship of groups 
D and E (as in the rbcL phylogeny, H. caespitosus clade in 
Figure 3), but at the same time both groups were clearly 
separated species because there were two CBCs [23,24] 
in helices II and III (Figure 4, Additional file 4); also 
their helices I and IV could not be aligned. No resolution 
was provided within the H. viridis clade, i.e. among spacer 
groups A, B and C (Figure 4). The complete ITS2 sequence 
was used to produce phylogenetic trees for 
individual spacer groups or rbcL clades. For example, 
within the rbcL H. viridis clade the spacer groups 
A-C were then resolved (Additional file 5). Within group A, 
both variants of strain MZ3-7 shared a common origin 
and were separated from other strains of the group. 
The three authentic strains, Heterococcus viridis SAG 
835-3, H. mainxii SAG 835-6, and H. marietanii SAG 
835-7, shared identical ITS2 sequences with each other 
(Additional file 7). Similarly, the ITS2 sequences of the two 
Antarctic strains EIF 398 and EIF PAB 398/473 were 
identical (Figure 2, Additional file 3). Two authentic 
strains, H. brevicellularis SAG 835-1 and H. moniliformis 
SAG 835-8, and one unidentified strain (SAG 
56.94) shared identical ITS2 sequences except for a short 
indel (4 nucleotides) in helix IV. Another congruence 
with the chloroplast-encoded data was within group E 
where the ITS2 sequences of two authentic strains, H. 
caespitosus SAG 835-2a and H. protonematoides SAG 
835-9, were identical. Conversely, within group B no 
differentiation among strains was possible due to the 
extensive radiation of multiple ITS2 variants of strain 
MZ1-3 (Additional file 5). Similarly, group D had extensive 
radiation of ITS2 variants and no relationships 
among strains were resolved (Additional file 4). Here 
the shorter variants of both strains EIF 423/A790-45 
and EIF PAB 399/372 were intermixed among each 
other; they formed two independent lineages distinct 
from a clade comprising the variants with extended 
terminal end of helix III. Within group F no clear 
distinction of the two strains DB14-1-1 (with multiple 
variants) and DB14-5-1 was provided (Figure 4). 
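
A compensatory base change (CBC) is counted when both nucleotides of a paired position differ between two sequences while base pairing is retained in each. The following is a minimal sketch of such a count, assuming two aligned RNA sequences that share one dot-bracket secondary structure and allowing G-U wobble pairs; it is not the procedure of the tools cited above, and the function names and toy helix are hypothetical.

```python
# Canonical Watson-Crick pairs plus G-U wobble pairs.
VALID_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def paired_positions(structure):
    """Return (i, j) index pairs for matching brackets in dot-bracket notation."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def count_cbcs(seq1, seq2, structure):
    """Count paired positions where both sides changed but pairing is kept."""
    if not (len(seq1) == len(seq2) == len(structure)):
        raise ValueError("sequences and structure must have equal length")
    cbcs = 0
    for i, j in paired_positions(structure):
        pair1 = (seq1[i], seq1[j])
        pair2 = (seq2[i], seq2[j])
        both_pair = pair1 in VALID_PAIRS and pair2 in VALID_PAIRS
        both_sides_changed = pair1[0] != pair2[0] and pair1[1] != pair2[1]
        if both_pair and both_sides_changed:
            cbcs += 1
    return cbcs


# Toy hairpin (hypothetical): one A-U to G-C exchange (a CBC) and one
# C-G to U-G exchange (one side only, a hemi-CBC that is not counted).
structure = "((((....))))"
helix_1 = "GCAGUUCGCUGC"
helix_2 = "GUGGUUCGCCGC"
print(count_cbcs(helix_1, helix_2, structure))  # 1
```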

Discussion 

Monophyletic species concept 

Our results show that morphological features do not 
characterize species; for example, we found that five au- 
thentic culture strains - used in the original descriptions 
for the five species - had nearly identical DNA 
sequences and ITS2 secondary structures. Furthermore, 
we found other examples where authentic strains or 
identified strains were synonymous with another species 
(see below). Almost all Heterococcus species have been 
described using the same morphological approach, we 
have examined all existing authentic culture strains, and 
we find that morphological species descriptions are inad- 
equate for this asexual genus. We conclude that mor- 
phological features characterize only individuals, not 
species. Therefore, we must apply a different species 
concept for Heterococcus. 

The 'monophyletic species concept' of Johansen and 
Casamatta [18], which is derived from the 'phylogenetic 
(autapomorphic) species concept' of Mishler and Theriot 
[16,17], is easily applied to asexual species when molecular 
data are available. In our study, the DNA sequences 
and ITS2 secondary structure comparisons recovered a 
clear and robust phylogenetic structure for the 29 Het- 
erococcus strains. Eight groups of sequences were repeat- 
edly recovered using three different molecular markers; 
sequences within each group were very similar or identi- 
cal while those between groups were highly variable. 
Using the monophyletic species concept, we recognize 
these groups as eight distinct species, and we identify 
previously unidentified strains and environmental clones 
to species level. 
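
One ingredient of this concept, whether a set of strains forms a clade in a given tree, can be checked mechanically. The sketch below is illustrative only: it assumes a rooted tree written as nested tuples with strain names as leaves, it covers only the clade criterion and not the sequence-similarity comparison, and the function names and toy tree are hypothetical.

```python
def leaves(node):
    """Return the set of leaf names below a node of a nested-tuple tree."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def is_monophyletic(tree, taxa):
    """True if some node of the rooted tree has exactly `taxa` as its leaves."""
    taxa = set(taxa)
    if leaves(tree) == taxa:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, taxa) for child in tree)


# Toy rooted tree (hypothetical labels, not the published topology).
toy_tree = ((("a1", "a2"), "a3"), (("b1", "b2"), "c1"))

print(is_monophyletic(toy_tree, {"a1", "a2", "a3"}))  # True
print(is_monophyletic(toy_tree, {"a1", "b1"}))        # False
```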

In a previous study, the rbcL gene and psbA/rbcL 
spacer were used, in conjunction with the monophy- 
letic species concept, to define species in the Tribone- 
mataceae, another asexual lineage of filamentous 
Xanthophyceae [21]. In that study, strains of the same 
species formed a monophyletic clade in the maximum 
likelihood rbcL gene phylogeny, and strains within the 
same species differed by less than 10 nucleotides. 
Within each species, the psbA/rbcL spacer was easily 
aligned, and within species variation was limited to 
single nucleotide differences and short indels. As with 
our study, the entire spacer could not be aligned be- 
tween species. Therefore, the molecular-based mono- 
phyletic species concept identifies species in the same 
way for both studies. 

The original iconotypes used to nomenclaturally an- 
chor all Heterococcus names consist of ink drawings of 
various morphological features. We have shown that 
these morphological features are not reliable for spe- 
cies identity, and ink drawings are very limited for 
reference. In some cases, neotype material was dried 
and deposited in a herbarium [2], but this too is am- 
biguous because in at least one case, the wrong cul- 
ture was used (see below) and because the material 
does not clearly separate species (still based upon 
morphology). Therefore, the names are herein further 
anchored with epitypes to avoid all ambiguity. The epi- 
types here designated are cryopreserved culture strains 
that can be re-investigated. The nomenclatural details 
are summarized below. 

Taxonomy and nomenclature 

Group A strains differed by no more than five sequence 
positions (one nonsynonymous substitution) in their 
rbcL genes, and the psbA/rbcL spacer regions aligned 
well over their entire lengths, with no more than 11 se- 
quence differences. Their ITS2 sequences also aligned 
well over their entire lengths and there were no more 
than eight ITS2 sequence positions different. Therefore, 
we regard group A as a single species, Heterococcus viridis, 
which is the type species for the genus. It is 
noteworthy that we used Chodat's [19] authentic strain, 
SAG 835-3 [3,30]. Group A also contained four additional 
species that were based upon authentic strains, 
H. brevicellularis, H. mainxii, H. marietanii and H. mon- 
iliformis [4,5]. We conclude that interpretations of 
largely overlapping morphological features, which 
were used to establish these as separate species, are 
not taxonomically sound; therefore, we consider 
these to be heterotypic synonyms of H. viridis (Figure 2, 
Additional files 3 and 7). Previously, Lokhorst [2] found 
that three of these strains were morphologically al- 
most indistinguishable and he considered them as 
varieties. Group A also includes two strains previously 
identified as H. pleurococcoides Pitschmann [1]. How- 
ever, the two strains were not authentic strains, and we 
cannot completely conclude that H. pleurococcoides is a 
heterotypic synonym of H. viridis. In addition, eight un- 
identified strains are now identified as H. viridis based 
on our study (Figure 2). 

Heterococcus viridis Chodat in Bull. Herb. Boissier, 
ser. 2, 8: p. 81 (1907). 

NEOTYPE: Material (authentic culture strain SAG 
835-3) deposited in Nationaal Herbarium Nederland, 
Leiden University (L) by G.M. Lokhorst in Taxonomic 
Studies in the Genus Heterococcus. Cryptogamie Studies 
Vol. 3, p. 40. (1992). 

EPITYPE DESIGNATED HERE: Cryopreserved cul- 
ture strain SAG 835-3, deposited in the Sammlung 
von Algenkulturen (SAG), Universität Göttingen,
Germany. 

Heterotypic synonyms: 

Heterococcus brevicellularis Vischer in Ergeb. Wiss. 
Unters. Schweiz. Nationalparkes, N.F. 1: p. 504; pl. 4,
figures 1-3; figure 17A, d-f; figure 18. (1945). 

Heterococcus mainxii Vischer in Ber. Schweiz. Bot. Ges. 
47: p. 233; figures 4-6 (1937). 

Heterococcus marietanii Vischer in Ber. Schweiz. Bot. 
Ges. 47: p. 235; figure 7 (1937). 

Heterococcus moniliformis Vischer in Ber. Schweiz. Bot. 
Ges. 47: p. 238; figures 8-9 (1937). 

Heterococcus marietanii Vischer var. moniliformis 
Lokhorst in Taxonomic Studies in the Genus Heterococ- 
cus. Cryptogamie Studies Vol. 3, p. 39. (1992). 

Two authentic strains in group E, H. caespitosus SAG 
835-2a and H. protonematoides SAG 835-9, were identi- 
cal when considering the three markers. We recognize 
group E as a single species. H. caespitosus was described 
first [3], and therefore has nomenclatorial priority over 
H. protonematoides [5], which becomes a heterotypic 
synonym. 

Heterococcus caespitosus Vischer in Ber. Schweiz. 
Bot. Ges. 45: p. 391, figures 4-10 (1936).

ICONOTYPE: Figures 4-10 in Vischer, W. Ber. 
Schweiz. Bot. Ges. 45: 372-410 (1936). 



NEOTYPE: Material (authentic culture strain SAG 
835-9) deposited in Nationaal Herbarium Nederland, 
Leiden University (L) by G.M. Lokhorst in Taxonomic 
Studies in the Genus Heterococcus. Cryptogamie Stu- 
dies Vol. 3, p. 12. (1992). Note: The culture strain used 
to designate the neotype material belonged to Hetero- 
coccus protonematoides, not H. caespitosus; see 
Lokhorst (1992, p. 12). 

EPITYPE DESIGNATED HERE: Cryopreserved cul- 
ture strain SAG 835-2a, deposited in the Sammlung von 
Algenkulturen (SAG), Universität Göttingen, Germany.

Heterotypic synonyms: 

Heterococcus protonematoides Vischer in Ergeb. Wiss. 
Unters. Schweiz. Nationalparkes, N.F. 1: p. 502, pl. 2,
figures 1-3; figures 15,16. (1945). 

For group G, H. crassulus was represented by an au- 
thentic strain, and we accept this as a recognized spe- 
cies. Similarly, for an unnamed group (see Figure 2), H. 
fuornensis was represented by an authentic strain, and 
therefore we recognize this as a distinct species. 

Heterococcus crassulus Vischer in Ergeb. Wiss. 
Unters. Schweiz. Nationalparkes, N.F. 1: p. 503, pl. 3,
figures 1-3; figures 17, 17A, l-o (1945). 

ICONOTYPE: Figure 17 in Vischer, W. Ergeb. Wiss. 
Unters. Schweiz. Nationalparkes, N.F. 1: 479-512 (1945). 

NEOTYPE: Material (authentic culture strain SAG 
835-4) deposited in Nationaal Herbarium Nederland, 
Leiden University (L) by G.M. Lokhorst in Taxonomic 
Studies in the Genus Heterococcus. Cryptogamie Studies 
Vol. 3, p. 12. (1992). 

EPITYPE DESIGNATED HERE: Cryopreserved cul- 
ture strain SAG 835-4, deposited in the Sammlung von 
Algenkulturen (SAG), Universität Göttingen, Germany.

Heterococcus fuornensis Vischer in Ergeb. Wiss. 
Unters. Schweiz. Nationalparkes, N.F. 1: p. 506, pl. 5,
figures 1-3; figure 17A, a-c; figure 19 (1945). 

ICONOTYPE: Figure 19 in Vischer, W. Ergeb. Wiss. 
Unters. Schweiz. Nationalparkes, N.F. 1: 479-512 (1945). 

NEOTYPE: Material (authentic culture strain SAG 
835-5) deposited in Nationaal Herbarium Nederland, 
Leiden University (L) by G.M. Lokhorst in Taxonomic 
Studies in the Genus Heterococcus. Cryptogamie Studies 
Vol. 3, p. 12. (1992). 

EPITYPE DESIGNATED HERE: Cryopreserved cul- 
ture strain SAG 835-5, deposited in the Sammlung von 
Algenkulturen (SAG), Universität Göttingen, Germany.

In group B, there were no more than nine different se- 
quence positions (one nonsynonymous substitution) 
among the complete rbcL sequences and only four nu- 
cleotide differences among the psbA/rbcL spacers. 
Strains of group B formed a well-supported monophy- 
letic clade independent of other groups/species in the 
rbcL phylogeny (Figure 3, Additional file 1) as well as 
phylogenetic analysis of the whole ITS2 region
(Additional file 5). Therefore, we recognize group B as a
distinct species. Placing a scientific name on group B 
(species B) is problematic because our study included all 
existing authentic cultures. Our molecular data, which 
were rigorously analyzed with phylogenetic methods, 
contradict species distinctions based upon non-rigorous 
intuition using highly variable morphological features, 
and we conclude that our rigorous analyses are more 
scientifically sound. Nonetheless, there are 61 named 
species, and perhaps group (species) B belongs to one of 
those species. If we simply propose a new name, then we 
are defying the intent of the International Code of Bo- 
tanical Nomenclature (or any other Code). Therefore, 
we simply apply four of the oldest names used in [6] for 
group (species) B and the three other groups (C, D, F) 
which contained no authentic strains. We assume that 
none of these names is in contradiction with the morph- 
ology of the strains we designate to represent the four 
species. We argue that establishing axenic cultures and
examining filaments at a certain culture age (as has
been done previously to define species of Heterococcus
[2-6]) is a poor way to identify species, and it does
not allow field samples to be identified to species.
With Heterococcus, growth in culture is a measure of
meaningless differences and there is no hope that 
morphology will ever be useful when trying to put a 
name on these four groups (species). We suggest that 
close phylogenetic relationship with defined reference 
(epitype) strains as well as genetic distance from corre- 
sponding strains of other species, evidenced by rbcL 
gene phylogenies and differences in the psbA/rbcL
spacers are appropriate to identify the species. Secondary 
structure of ITS2 constitutes an additional autapo- 
morphic feature to define species of Heterococcus. We 
use Heterococcus conicus Pitschmann as name for group 
(species) B. 

Heterococcus conicus Pitschmann in Pitschmann, 
H. Nov. Hed. 5 (3/4), p. 498, plate 96, Figures 11-16, 
(1963) 

ICONOTYPES: Plate 96, Figures 11-16, in Pitsch- 
mann, H. Nov. Hed. 5 (3/4), (1963) 

NEOTYPE: Material (culture V 111) deposited in 
Nationaal Herbarium Nederland, Leiden University (L) 
by G.M. Lokhorst in Taxonomic Studies in the Genus 
Heterococcus. Cryptogamie Studies Vol. 3, p. 12. (1992). 

EPITYPE DESIGNATED HERE: Cryopreserved cul- 
ture strain MZ1-3, deposited in the Sammlung von 
Algenkulturen (SAG), Universität Göttingen, Germany.

Group C consisted of a single strain, SAG 2163 
(Figure 2), which formed a distinct lineage in the rbcL 
and full ITS2 phylogenies (Figure 3, Additional files 1 
and 4). It was also distinct in its psbA/rbcL spacer from
H. viridis and H. conicus, which were the closest relatives
of SAG 2163. Therefore, we recognize group C as a



distinct species and we use Heterococcus virginis Pitsch- 
mann as name. Two unidentified lichen photobionts 
share identical partial rbcL sequences (JN573801 and 
JN573802; [8]) and these differed by only one nucleotide 
from SAG 2163. Therefore, we assign these lichen 
photobionts to H. virginis as well. 

Heterococcus virginis Pitschmann in Pitschmann, 
H. Nov. Hed. 5 (3/4), p. 497, plate 96, Figures 1-5, 
(1963) 

ICONOTYPES: Plate 96, Figures 1-5, in Pitschmann, 
H. Nov. Hed. 5 (3/4), (1963). 

EPITYPE DESIGNATED HERE: Cryopreserved cul- 
ture strain SAG 2163 deposited in the Sammlung von 
Algenkulturen (SAG), Universität Göttingen, Germany.

Group D comprised three strains with no nucleotide
difference in the rbcL gene and a single difference in the
psbA/rbcL spacer. In the ITS2 phylogeny, the three strains could not
be distinguished due to different ITS2 variants that are 
intermixed (Additional file 4). Strains of group D exhibit 
a unique ITS2 secondary structure with a rather long 
helix III with considerable length variation at its ter- 
minal end (Additional file 4). Despite being closely 
related to H. caespitosus (group E) in the rbcL phylogeny 
(Figure 3 and Additional file 1) there are two CBCs in 
ITS2 that separate group D strains from the latter spe- 
cies. Consequently, we recognize group D as a distinct 
species, Heterococcus leptosiroides Pitschmann. One en- 
vironmental clone sequence from Antarctic soils 
(AJ580925) shared full sequence identity in rbcL gene 
with group D strains, and therefore, we conclude that 
the environmental clone belongs to H. leptosiroides. 
Group D included strains identified as H. caespitosus 
and H. protonematoides based on morphology; how- 
ever, neither was an authentic culture and again we 
consider identification based on 

Heterococcus leptosiroides Pitschmann in Pitsch- 
mann, H. Nov. Hed. 5 (3/4), p. 497, plate 96, Figures 
6-10, (1963). 

ICONOTYPES: Plate 96, Figures 6-10, in Pitschmann, 
H. Nov. Hed. 5 (3/4), (1963). 

EPITYPE DESIGNATED HERE: Cryopreserved culture 
strain EIF 423/A790-45 deposited in the Sammlung von 
Algenkulturen (SAG), Universität Göttingen, Germany.

Finally, group F contained two strains with fully identi- 
cal complete rbcL sequences and two differences in their 
psbA/rbcL spacers. In the ITS2 phylogeny the two strains 
could not be distinguished due to the variation of mul- 
tiple copies (Figure 4C). The group F strains had a 
unique ITS2 secondary structure with a particularly long 
helix III (Figure 4B). Group F forms an independent 
lineage within the Heterococcus clade in the rbcL phyl- 
ogeny (Figure 3, Additional file 1). Consequently, we 
recognize group F as a distinct species and use Hetero-
coccus ramosissimus Pitschmann as name.

Heterococcus ramosissimus Pitschmann in Pitsch-
mann, H. Nov. Hed. 5 (3/4), p. 499, plate 97, Figures
1-4, (1963) 

ICONOTYPES: Plate 97, Figures 1-4, in Pitschmann, 
H. Nov. Hed. 5 (3/4), (1963). 

EPITYPE DESIGNATED HERE: Cryopreserved cul- 
ture strain DB14-1-1 deposited in the Sammlung von 
Algenkulturen (SAG), Universität Göttingen, Germany.

The authentic strain of H. endolithicus was described 
by Darling and coworkers [1], 195/A790-35 (accessioned 
as strain SAG 63.90 by the SAG culture collection), but 
our study revealed that it represents a green alga, i.e. a 
close relative of Desmococcus species (Trebouxiophy- 
ceae) (Figure 2). Our microscopic investigation of SAG 
63.90 revealed the same morphology as described previ- 
ously [1]. Significantly, this morphology is somewhat 
similar to the morphology of Desmococcus [10], and this 
makes us confident that SAG 63.90 still represents the 
original isolate. Despite Darling and coworkers [1] hav- 
ing reported a "typical xanthophycean plastid structure" 
based on electron microscopy, they already considered 
H. endolithicus distinct from all other Heterococcus spe- 
cies because it did not form long filaments. In addition, 
three more strains from Antarctic soils were also identi-
fied as Desmococcus-like green algae (Figure 2). Therefore,
we exclude H. endolithicus from the genus Heterococcus 
and propose a new nomenclatural combination for this 
authentic strain, but unfortunately we cannot apply the 
specific epithet (endolithicus) because the name Desmo-
coccus endolithicus Broady & Ingerfeld already exists [31].
Therefore, we propose an avowed substitute name: 

Desmococcus antarctica (Darling & Friedmann) 
Rybalka, Wolf, Andersen & Friedl comb. nov. 

Basionym: Heterococcus endolithicus Darling & Fried-
mann in Darling et al. J. Phycol. 23: 599, Figures 2a-c, 3.
(1987).

EPITYPE DESIGNATED HERE: Cryopreserved cul- 
ture strain SAG 63.90 deposited in the Sammlung von 
Algenkulturen (SAG), Universität Göttingen, Germany.

Infraspecific Variation and Geographical Distribution 

Our relatively small sample of 29 Heterococcus strains 
already showed eight groups (= eight species). Within 
the five species for which multiple strains were available, 
the psbA/rbcL spacer sequences even resolved groups of 
strains with nearly identical sequences (genotypes; 
Figure 2). Strains with identical, or nearly identical, 
sequences were repeatedly found in our relatively small 
sample of Heterococcus strains and, importantly, estab- 
lished at different times from geographically distant lo- 
calities. This implies that the number of species within 
Heterococcus might be rather limited. The same geno- 
types were confined to certain habitats (soil or fresh- 
water) and geographical regions (Europe or Antarctica). 



For example, H. viridis strains SAG 835-3, SAG 835-6
and SAG 835-7 were collected from freshwater habitats
in Europe while all other strains of the species were
from soil in Europe or Antarctica (Figure 2); they re-
present a distinct subgroup (genotype) within the spe-
cies. Similarly, two strains of H. conicus were collected
from freshwater in Europe (DB14-15, DB15-5) whereas
the other two H. conicus strains were collected from
Antarctic soil (MZ1-3, MZ1-6; Figure 2). We draw two 
conclusions. First, the two species are geographically 
widespread and will grow where suitable habitats exist. 
Second, genotypes of those growing in freshwater are 
distinct from those growing in soil. The sample size is 
exceedingly small, but there is a suggestion that our mo- 
lecular data are separating populations within both spe- 
cies that have distinctly different habitats. 

We also note that half of the Heterococcus genotypes 
in our sample originated from Antarctica but not a sin- 
gle genotype was shared between Antarctic and Euro- 
pean strains, i.e. none of the Antarctic Heterococcus 
strains shared identical psbA/rbcL spacer sequences with 
the European strains. A previous study showed that Ant- 
arctic strains within a single species of the xanthophyte 
Xanthonema were distinguished from their temperate 
counterparts by only few nucleotides for the highly vari- 
able psbA/rbcL spacers [21]. Therefore, our findings for 
Heterococcus support the view that the Antarctic and 
temperate strains of xanthophyte species represent dif- 
ferent populations of a single species. That is, the Ant- 
arctic strains of a given species share their own common 
evolutionary histories, implying that there was only one 
(relatively recent) colonization event in Antarctica for 
each species. Alternatively, if multiple colonization 
events occurred, then the invasions were too recent to 
produce significant divergence [32,33]. 

ITS2 sequence features 

Our ITS2 sequences are, to our knowledge, the first 
ITS2 sequences available for Xanthophyceae. Given that 
available ITS2 sequence information for stramenopile 
algae is still limited, two aspects of the Heterococcus 
ITS2 sequences appear unusual, but might be useful for 
taxonomy. First, in Heterococcus ITS2 lengths were ap- 
proximately 300 nucleotides long in most strains; group 
F (H. ramosissimus) sequences were almost 400 bps. The
average length of ITS2 across all eukaryotes is about 
210 bps as inferred from the ITS2 database IV [34]. In 
group D, two size classes occurred, i.e. either ~250 or
~300 bps, due to a large indel at the terminal end of
helix III. Other stramenopile algal groups, the Bacillario- 
phyceae and Phaeophyceae, show a bimodal distribution 
of their ITS2 sequence lengths, i.e. around 250/290 bps 
and around 250/350 bps, respectively. Second, the ITS2 
sequences were rather variable, i.e. only few and rather
short sequence segments were alignable with confidence
across the eight Heterococcus species. Such a high se- 
quence variation among species of a single genus is un- 
usual, at least as compared to genera and species of 
green algae where ITS2 has been revealed as a reliable 
molecular marker already many times, e.g. [35-38]. Fi- 
nally, because the ITS2 rDNA sequences were so vari- 
able in Heterococcus, it is not possible to safely define 
compensatory base changes (CBCs), which can be 
deduced only from well aligned sequences. CBCs in con- 
served regions of the helices of ITS2 have been proposed 
for distinguishing microalgal species when sexual 
reproduction is unknown [23,24]. However, the concept 
of CBCs does not imply that two strains lacking CBCs 
must belong to the same species. That is, there may be 
other criteria that define microalgal species. 
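
To make the notion of a compensatory base change concrete: a CBC is a paired position in a helix at which both nucleotides differ between two sequences while base pairing is retained (for example G-C versus A-U). The following is a minimal sketch, assuming two sequences that are already reliably aligned and share a single consensus secondary structure given in dot-bracket notation. The sequences and structure are hypothetical toy data, and the function names are illustrative rather than part of any of the programs cited here. As stated above, such a comparison is only meaningful for well-aligned helix regions.

```python
# Canonical Watson-Crick pairs plus the G-U wobble pair.
VALID_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def paired_positions(dot_bracket):
    """Return sorted (i, j) index pairs for matching brackets."""
    stack, pairs = [], []
    for idx, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(idx)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), idx))
    if stack:
        raise ValueError("unbalanced structure")
    return sorted(pairs)


def find_cbcs(seq_a, seq_b, dot_bracket):
    """List paired positions where both bases differ yet pairing is kept.

    Changes on only one side of a pair (hemi-CBCs) are not reported.
    """
    seq_a = seq_a.upper().replace("T", "U")
    seq_b = seq_b.upper().replace("T", "U")
    if not (len(seq_a) == len(seq_b) == len(dot_bracket)):
        raise ValueError("sequences and structure must have equal length")
    cbcs = []
    for i, j in paired_positions(dot_bracket):
        pair_a = (seq_a[i], seq_a[j])
        pair_b = (seq_b[i], seq_b[j])
        both_sides_changed = pair_a[0] != pair_b[0] and pair_a[1] != pair_b[1]
        if both_sides_changed and pair_a in VALID_PAIRS and pair_b in VALID_PAIRS:
            cbcs.append((i, j, pair_a, pair_b))
    return cbcs


# Hypothetical toy helix: one G-C pair replaced by A-U.
structure = "((((...))))"
variant_1 = "GCGAAAAUCGC"
variant_2 = "GCAAAAAUUGC"
print(find_cbcs(variant_1, variant_2, structure))
```

An empty result from such a comparison does not by itself indicate conspecificity, in line with the caveat above.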

The high ITS2 sequence variability is in line with our 
maximum likelihood (GARLI and RAxML) analyses that 
had weak support for the monophyletic origin of the 
genus (Additional file 1). The monophyletic origin of 
Heterococcus was also weakly supported by a multi-
gene phylogenetic analysis of photosynthetic strameno-
piles that included three of our Heterococcus species
[15]. Therefore, our results may suggest that more data 
(and better taxon sampling) are required to firmly dem- 
onstrate the monophyly of Heterococcus, or they may 
suggest that some of the species defined in our study be- 
long to a separate, and sister, genus. 

Conclusions 

Application of the monophyletic species concept using 
the highly variable chloroplast-encoded psbA/rbcL spa- 
cer, the more conserved plastid rbcL gene, and the 
nuclear-encoded ITS2 provided a reference data base for 
unambiguous identification of the common cold soil 
microalga Heterococcus. Eight species were recognized
and characterized at the molecular level. Previous taxo- 
nomic studies relied entirely on morphological features 
produced in cultures; our data will facilitate diversity 
assessments that are independent of culturing. In 
addition, the PCR amplification approach for the psbA/
rbcL spacer is specific for Xanthophyceae. Using the
new reference data base, partial sequences of the psbA/
rbcL spacer and/or ITS2 may already be sufficient for
the assignment of a new strain to a certain species.
There are some difficulties; amplification of the psbA/
rbcL spacer may be hampered by length variations, and
sequence analyses of ITS2 may be complicated by mul- 
tiple variants per strain. Using the monophyletic species 
concept, our species are mostly in contrast to those 
defined by the morphological (typological) species con- 
cept. We conclude that the extensive morphological 
plasticity displayed in culture cannot be interpreted 
without rigorous methods (e.g. cladistics), and the largely 



overlapping morphological characteristics make cladistic 
analysis very difficult or impossible. The identical, but 
highly variable, sequences that were repeatedly recovered 
among the species, suggest that the species diversity of 
Heterococcus is not extensive, especially considering the 
repetition that occurred in our small sampling from 
Europe and Antarctica. The observed sequence changes 
within a species may reflect adaptations to different 
types of habitats or climates and distinguish geographic- 
ally widely separated strains. 

Methods 

Culture strains 

Twenty three culture strains were received from the 
SAG culture collection [39,40]; five strains were pro- 
vided by other workers in the field. Another five isolates 
(strains MZ1-3, MZ1-6, MZ2-4, MZ2-5, MZ3-7) were 
newly established using methods described previously 
[21] from Antarctic soil samples, i.e. the forefield of 
Baranowski Glacier, King George Island (collected De-
cember 12, 2008 by M. Olech). Strains MZ1-3 and MZ1-6
were from the same sample, about 5 m from the glacier
(62°12'34.9"S, 58°26'55.7"W) at 10 m a.s.l. Strains MZ2-4
and MZ2-5 also were from a single sample, a frontal mo-
raine (62°12'34.4"S, 58°26'50.2"W) at 16 m a.s.l. Strain
MZ3-7 was from a basal moraine (62°12'33.4"S,
58°26'41.1"W) at 6 m a.s.l. Antarctic strain B10 (provided
by A. Massalski) was also isolated from King George Is-
land, but from transect B (near Ecology Glacier, about
370 m farther inland; [41]). Four isolates (DB14-1-1,
DB14-5-1, DB14-15 and DB15-5; provided by K. M.
Mohr) were from cyanobacteria-dominated biofilms cov- 
ering rocks at two neighboring locations of the main 
spring of the tufa-forming karst-water creek, Deinschwan- 
ger Bach, located at the western margin of the Franconian 
Alb, approximately 30 km ESE of Nürnberg, Germany
(49°23'N, 11°28'E) [42]. The ten new isolates have been 
accessioned by the SAG culture collection under strain 
numbers as given in Additional file 2. 

DNA extraction, PCR amplification and sequencing of 
strains 

DNA was isolated from fresh cultures as in [21]. For de- 
termining sequences of the plastid-encoded psbA/rbcL 
spacer which lies upstream of the rbcL gene, i.e. between 
the psbA and rbcL genes, and full-length sequences of 
the rbcL gene the PCR approach of Andersen and Bailey 
[20] modified to amplify the target sequence in one 
piece [21] was used. The 5' primer psbA5 [20] or Xan2F
[21], anchored in the psbA gene, and the 3' primer RS3
[20] placed in rbcS (downstream of rbcL) were used.
However, for strains with extremely long psbA/rbcL
spacers, PCR amplification was in two overlapping frag-
ments, i.e. with primer pairs psbA5 and X5RG (the
reverse complement of primer X5FG [21]) and Xan3F
[21] and RS3 [20]. For amplification of ITS2, PCR pri-
mers Xits2F (5'-GCTACACTCTGACACCTG-3';
which binds at the 5'-end of the 18S rRNA gene, i.e. pos.
1462-1477 of reference sequence AM490822 H. viridis
SAG 835-3) and LR1850 [43] were used to amplify an
rDNA fragment that extended from the 3'-end of the SSU
downstream to the 5'-end of the LSU rDNA. The same
cycling parameters were used for all PCR reactions as 
described previously [21]. PCR products were purified 
using Invisorb Spin PCRapid Kit (Invitek, Berlin, 
Germany) or MSB Spin PCRapace Kit (Invitek, Berlin, 
Germany). Sequence determination of the psbA/rbcL 
spacer was as previously [21], but complemented by nine 
additional primers to obtain the sequences of the ex-
tremely long psbA/rbcL spacers present in some Hetero-
coccus strains, i.e. hetnew_F (5'-GGTACAACTGAYCAATT-3'),
het_F (5'-GGTGGTACAATTGGYCATCCAGA-3'),
spacer2R (5'-ATTCGAGTACGCTCTTGTA-3'),
DB_F (5'-GGCAAGCCTTTCACTCTTGAT-3'),
DB_R (5'-CCACCCGGATTTAAAAGAGTT-3'),
DB_F2 (5'-TTCGATACGGGAAACAACTT-3'),
DB_R2 (5'-GATCCTTTGGTTCAACTTAGAAGA-3'),
SAG_F (5'-CAAGCTTCGACTGAGGCTT-3'), and
SAG_R (5'-ATTGCAAGGCAAGCCTTG-3'). The latter two sequen-
cing primers were used only for H. crassulus strain SAG
835-4, the "DB" primers only with the two isolates
DB14-1-1 and DB14-5-1. The rbcL sequences were
checked against the NCBI gene sequence database using 
nucleotide BLAST (blastn) [44,45] to confirm that they 
were Xanthophyceae. For four strains from the SAG cul- 
ture collections no PCR products of plastid-encoded 
markers as described above could be obtained and then 
a portion of the nuclear-encoded 18S rRNA gene was 
sequenced with primer 895R [46] after PCR amplifica- 
tion with primers preferentially binding to green algal 
rDNA, primers 20F [8] and CH1750R [46] and checked
against the NCBI gene sequence database. For ITS2 se- 
quence determination, the sequencing primers were 
5.8SbF and 5.8SbR [47], 1800F [43] and ITS4Xan
(5'-TCCTCCGCTTAGTTATATGC-3'), which was a modi- 
fication of primer ITS4 [48]. In several cases no clear 
sequence reads were obtained, even after repeated PCR 
and sequencing attempts, due to multiple copies of the 
ITS2 which varied in primary sequences (see Results). 
Then cloning of the PCR products was performed with the 
TOPO TA cloning kit and the pCR2.1-TOPO vector (Invi- 
trogen, Carlsbad, CA, USA). Ligations were transformed 
into competent E. coli TOP10 cells as supplied by the
manufacturer. In the plasmid screening, white E. coli col- 
onies containing correct DNA insertions were identified by 
direct amplification of the inserted DNA fragment with a 
vector-specific primer set M13F/M13R. The ITS fragments 
were re-amplified from M13F/M13R PCR products with 



primer pair Xits2F/LR1850 as described above or the 
clones were cultivated overnight in LidBac reaction tubes 
(Qiagen, Hilden, Germany) with 1 ml LB medium contain-
ing 100 µg ampicillin, and plasmid DNA was prepared from
the clones with a NucleoSpin Plasmid kit (Macherey and
Nagel, Düren, Germany) following the manufacturer's instruc-
tions. Sequencing reactions were performed with the Dye
Terminator Cycle Sequencing v3.1 kit (Applied Biosystems, 
Darmstadt, Germany) and separated on an ABI Prism 
3100 (Applied Biosystems, Darmstadt, Germany) sequen- 
cer. The sequences were assembled using the program 
SeqAssem [49]. For GenBank accession numbers of newly 
determined sequences for the 29 Heterococcus strains see 
Additional file 2; the accession numbers for the four green 
algal sequences determined in this study are JX681197 - 
JX681200. 
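
Several primers above are defined relative to one another (e.g. X5RG as the reverse complement of X5FG), and some contain degenerate IUPAC positions such as Y. As an illustration only, the following minimal sketch derives the reverse complement of a primer while respecting IUPAC ambiguity codes. The function name is hypothetical, and the example uses the ITS4Xan sequence exactly as printed in the text.

```python
# IUPAC nucleotide codes and their complements.
IUPAC_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W",
    "K": "M", "M": "K", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}


def reverse_complement(primer):
    """Return the reverse complement of a 5'->3' DNA primer.

    Ambiguity codes are complemented as well (e.g. Y <-> R).
    Raises KeyError for characters that are not IUPAC nucleotide codes.
    """
    return "".join(IUPAC_COMPLEMENT[base] for base in reversed(primer.upper()))


# ITS4Xan as given in the Methods (5'->3').
ITS4XAN = "TCCTCCGCTTAGTTATATGC"
print(reverse_complement(ITS4XAN))  # GCATATAACTAAGCGGAGGA
```

Applying the function twice returns the original primer, which is a convenient sanity check when primer lists are transcribed by hand.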

Chloroplast-encoded marker analysis 

The chloroplast-encoded marker sequences (from 3'-end 
of psbA downstream to 5'-end of rbcS) were manually 
aligned using Bioedit [50] and Seaview [51] editors from 
which the rbcL sequence alignment used for the phylo- 
genetic analyses was extracted. The rbcL sequence align- 
ment was constructed using 15 of the sequences newly 
determined for Heterococcus in this study to which 32 
other sequences available for the Xanthophyceae clades 
C, B, T, and V as defined previously [14] were ad- 
ded (Additional file 1). The two phaeophycean sequen- 
ces Fucus vesiculosus NC016735 and Ectocarpus sp. 
AY372978 were employed to root the phylogeny. The 
alignment was subjected to distance, maximum-parsimony
(MP) and maximum-likelihood (ML) approaches. Modeltest
3.7 [52] used in conjunction with PAUP* 4b10 [53]
determined that the GTR+I+G model [54] provided the
best fit to the data according to the AIC criterion with esti-
mations of nucleotide frequencies (A = 0.2859, C = 0.1447,
G = 0.1981, T = 0.3714), a rate matrix with six different
substitution types, assuming a heterogeneous rate of sub-
stitutions with a gamma distribution of variable sites, num-
ber of rate categories = 4, shape parameter α = 0.8249 and
proportion of invariable sites (pinvar) of 0.4977. This
model was used for the minimum evolution distance (ME)
approach performed with PAUP* 4b10 (DNA distances set
to maximum likelihood) and the maximum likelihood (ML)
approach using GARLI v0.96 [25,26]. A complementary
ML phylogeny construction was done with the program
RAxML [55], using the GTR+Γ+I model and with 100
bootstrap replicates. Neighbor-joining distance (NJ) phylo-
genies were constructed in connection with the "HKY85
model" [56] with PAUP* 4b10. For ME and maximum par-
simony (MP) tree reconstruction (PAUP* 4b10) a heuristic
search procedure with 10 random input orders of
sequences and TBR was employed to find the best tree.
Best scoring trees were held at each step. In MP analyses,
the sites were weighted (RI over an interval of 1-1000).
Bootstrap resampling was performed on NJ, ME, MP with 
1000 replications and 2000 replications on ML GARLI 
trees. For the Bayesian analysis the program MrBayes ver- 
sion 3.1.2 [57] was used with procedures as described earl- 
ier [58]. 
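
For readers unfamiliar with model selection by the Akaike information criterion, the comparison reduces to AIC = 2k - 2 ln L, where k is the number of free parameters and ln L the maximized log-likelihood; the model with the lowest AIC is preferred. The sketch below illustrates the arithmetic only. The log-likelihood values are invented for illustration and are not results from this study, and the parameter counts cover substitution-model parameters only (branch lengths are ignored, which is an assumption).

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical (ln L, k) values for four candidate models.
# k counts free substitution-model parameters only:
#   JC: 0; HKY85: 3 base frequencies + 1 ti/tv ratio;
#   GTR: 3 base frequencies + 5 relative rates;
#   +I and +G each add one parameter.
candidates = {
    "JC": (-12500.0, 0),
    "HKY85": (-12100.0, 4),
    "GTR": (-12050.0, 8),
    "GTR+I+G": (-11700.0, 10),
}

scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
best_model = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:8s} AIC = {score:.1f}")
print("preferred:", best_model)
```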

Nuclear-encoded ITS2 sequence-structure analysis 

Using hidden Markov models (HMMs) nuclear ITS2 
sequences have been annotated according to [59]. One ITS2 
sequence from each group A, D, F and H. fuornensis strain 
SAG 835-5 was used for secondary structure prediction. 
Based on minimum free energy ITS2 secondary structures 
were directly folded with the help of the "RNAstructure" 
software [60,61] and manually corrected. The four 
sequence-structure pairs were used as templates for hom- 
ology modeling of the remaining 39 secondary structures 
[62]. In accordance with [63] the phylogenetic analysis fol-
lowed the procedure outlined in [23,34,64,65]: automatically,
a multiple sequence-structure alignment was generated in
4SALE v1.7 [29,66], i.e. either partial (Figure 4) or full
(Additional files 4, 5, 6) sequences and their secondary 
structures were synchronously aligned, making use of an 
ITS2 sequence-structure specific scoring matrix [66,67]. 
Based simultaneously on the primary sequence and the sec- 
ondary structure information, phylogenetic relationships 
were reconstructed using NJ in conjunction with
an ITS2 sequence-structure specific general time revers-
ible (GTR) substitution model as implemented in Prof-
DistS v0.9.9 [27,67]. Bootstrap support [68] was estimated
based on 100 pseudo-replicates (Figure 4, Additional files 
4, 5, 6). Trees were visualized using Treeview [69]. 
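
Bootstrap support values such as those reported here rest on resampling alignment columns with replacement to create pseudo-replicate data sets, each of which is then analysed like the original. The following minimal sketch shows only this generic resampling step for a plain sequence alignment. It is not the implementation used by ProfDistS or any other program cited above, and the toy alignment and function name are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate by resampling columns with replacement.

    `alignment` maps a label to an aligned sequence; all sequences must
    have the same length. `rng` is a random.Random instance.
    """
    if not alignment:
        return {}
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    length = lengths.pop()
    # Draw column indices with replacement; the same draw is applied to
    # every sequence so that columns stay intact.
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[c] for c in columns)
        for name, seq in alignment.items()
    }


# Hypothetical toy alignment (not real ITS2 data).
toy_alignment = {
    "variant_1": "ACGTACGT",
    "variant_2": "ACGTTCGT",
    "variant_3": "ACGAACGA",
}
rng = random.Random(42)  # fixed seed for reproducibility
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(100)]
print(len(replicates), "pseudo-replicates generated")
```

Each pseudo-replicate would then be passed to the tree-building step, and the support for a branch is the percentage of replicate trees that contain it.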

Additional files 



Additional file 1: Maximum likelihood (ML) phylogeny of rbcL gene
sequences for Heterococcus and other members of Xanthophyceae.

The phylogeny was calculated with the programme GARLI v0.96 [25,26]
based on a rbcL data set (1325 bp long, 517/418 variable/parsimony
informative sites) consisting of 15 Heterococcus and 32 other
Xanthophyceae sequences (corresponding to clades C, B, T, and V as
defined in [14]) as well as two sequences from Phaeophyceae as
outgroup. Numbers mapped to
internodes are bootstrap values from 2000 replicates; only values >70%
have been recorded. The phylogeny in this Figure includes the
phylogeny of 15 Heterococcus strains shown in Figure 3 (highlighted). The
inserted table lists bootstrap values mapped to internodes of the
Heterococcus clade using six different analysis methods (see text). Scale
bar, substitutions per site.

Additional file 2: DNA sequences newly determined for 29
Heterococcus strains and their GenBank sequence accession
numbers. For the psbA/rbcL spacer and full rbcL gene all determined
sequences are listed, for ITS2 only those sequences that were different
from each other. (p), only psbA/rbcL spacer and partial rbcL gene
could be determined; (a), already made available previously; n.a., not
applicable.

Additional file 3: Groups of Heterococcus strains with fully identical
rbcL and/or psbA/rbcL spacer sequences. Strains marked in bold were



used for the rbcL phylogeny (Figure 3, Additional file 1). Species
assignment is according to the new species designation as in Figure 2 
(see Discussion). 

Additional file 4: ITS2 sequence and secondary structure 
phylogenetic analyses of three strains of Heterococcus group D (H. 
leptosiroides). (A) ProfDistS [27] sequence-structure NJ tree (unrooted) as 
derived from the multiple sequence-structure alignment of ITS2 helices I- 
IV recovered for strains of group D, H. leptosiroides. Bootstrap values 
based on 100 pseudo-replicates are mapped to the appropriate 
internodes. Branch lengths are drawn proportional to inferred changes. 
The template ITS2 variant used in B) is highlighted in bold. Scale bar, 
substitutions per site. (B) ITS2 secondary structure of ITS2 variant EIF 423/ 
A790-5_cl65 used for homology modeling of secondary structures for all 
strains of group D (H. leptosiroides). The secondary structure was 
visualized with VARNA [28]. Helices are numbered I-IV. Four strains
indicated by an asterisk are devoid of the apical part of helix III. An 
arrowhead indicates the highly conserved GGU motif 5' to the apex of 
helix III. A cloud highlights the segment of helix III conserved across all 
studied strains. Open arrowheads mark positions of two CBCs that 
distinguish groups D and E. 

Additional file 5: ITS2 sequence and secondary structure 
phylogenetic analyses of twelve strains of Heterococcus groups A-C 

(H. viridis, H. conicus, H. virginis). (A) ProfDistS [27] sequence-structure 
NJ tree (unrooted) as derived from the multiple sequence-structure 
alignment of ITS2 helices I-IV recovered for strains of the H. viridis clade, i.e.
groups A-C, H. viridis, H. conicus and H. virginis. Bootstrap values based
on 100 pseudo-replicates are mapped to the appropriate internodes. 
Branch lengths are drawn proportional to inferred changes. The template 
ITS2 variant used in (B) is highlighted in bold. Scale bar, substitutions per 
site. (B) Secondary structure of ITS2 variant H. viridis EIF 430/A801-2_24 
used for homology modeling of secondary structures for all strains of 
Heterococcus groups A-C. The secondary structure was visualized with 
VARNA [28]. Helices are numbered I-IV. An arrowhead indicates the 
highly conserved GGU motif 5' to the apex of helix III. A cloud highlights 
the segment of helix III conserved across all studied strains. 

Additional file 6: ITS2 sequence and secondary structure 
phylogenetic analyses of Heterococcus fuornensis strain SAG 835-5. 
(A) ProfDistS [27] sequence-structure NJ tree (unrooted) of ITS2 variants 
recovered from strain H. fuornensis SAG 835-5 as derived from the 
multiple sequence-structure alignment of ITS2 helices I-IV. Bootstrap 
values based on 100 pseudo-replicates are mapped to the appropriate 
internodes. Branch lengths are drawn proportional to inferred changes. 
The template ITS2 variant used in (B) is highlighted in bold. Scale bar, 
substitutions per site. (B) Secondary structure of ITS2 variant SAG 835-5_46 
used for homology modeling of secondary structures for all ITS2 
variants of the same strain. The secondary structure was visualized with 
VARNA [28]. Helices are numbered I-IV. An arrowhead indicates the 
highly conserved GGU motif 5' to the apex of helix III. A cloud highlights 
the segment of helix III conserved across all studied strains. 

Additional file 7: DNA sequence differences among five authentic 
strains of Heterococcus group A. Distance matrices with number of 
sequence position differences from the rbcL gene, the psbA/rbcL spacer 
and ITS2 between the five authentic strains of Heterococcus group A 
(assigned to H. viridis, see text). In brackets, the total number of 
differences found with a certain molecular marker among the five strains. 
An asterisk marks the strain that is distinct from others by the presence 
of a "GCAA" indel in helix IV of ITS2. 
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